1 Session S6.2: compilers and program analysis: PACT HDL: a C compiler targeting
ASICs and FPGAs with power and performance optimizations 

Alex Jones, Debabrata Bagchi, Satrajit Pal, Xiaoyong Tang, Alok Choudhary, Prith Banerjee 
October 2002 Proceedings of the 2002 international conference on Compilers, 
architecture, and synthesis for embedded systems 

Full text available: pdf(340.93 KB) Additional Information: full citation, abstract, references, index terms

Chip fabrication technology continues to plunge deeper into sub-micron levels requiring 
hardware designers to utilize ever-increasing amounts of logic and shorten design time. 
Toward that end, high-level languages such as C/C++ are becoming popular for hardware 
description and synthesis in order to more quickly leverage complex algorithms. Similarly, 
as logic density increases due to technology, power dissipation becomes a progressively 
more important metric of hardware design. PACT HDL, a C to ... 

Keywords: ASIC, FPGA, FSM, HDL, IP, SoC, VHDL, Verilog, compiler, high-performance, 
levelization, low-power, pipelining, synthesis 



2 Missing the memory wall: the case for processor/memory integration
Ashley Saulsbury, Fong Pong, Andreas Nowatzyk 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2
Full text available: pdf(1.45 MB) Additional Information: full citation, abstract, references, citings, index terms

Current high performance computer systems use complex, large superscalar CPUs that 
interface to the main memory through a hierarchy of caches and interconnect systems. 
These CPU-centric designs invest a lot of power and chip area to bridge the widening gap 
between CPU and main memory speeds. Yet, many large applications do not operate well 
on these systems and are limited by the memory subsystem performance. This paper argues
for an integrated system approach that uses less-powerful CPUs that are ... 

3 Piranha: a scalable architecture based on single-chip multiprocessing
Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz 
Qadeer, Barton Sano, Scott Smith, Robert Stets, Ben Verghese 

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th
annual international symposium on Computer architecture, volume 28 issue 2
Full text available: pdf(191.10 KB) Additional Information: full citation, abstract, references, citings, index terms

The microprocessor industry is currently struggling with higher development costs and 
longer design times that arise from exceedingly complex processors that are pushing the 
limits of instruction-level parallelism. Meanwhile, such designs are especially ill suited for 
important commercial applications, such as on-line transaction processing (OLTP), which 
suffer from large memory stall times and exhibit little instruction-level parallelism. Given 
that commercial applications constitute by fa ... 

4 Multiway FPGA partitioning by fully exploiting design hierarchy
Wen-Jong Fang, Allen C.-H. Wu 

January 2000 ACM Transactions on Design Automation of Electronic Systems
(TODAES), Volume 5 Issue 1
Full text available: pdf(130.36 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we present a new integrated synthesis and partitioning method for multiple- 
FPGA applications. Our approach bridges the gap between HDL synthesis and physical 
partitioning by fully exploiting the design hierarchy. We propose a novel multiple-FPGA 
synthesis and partitioning method which is performed in three phases: (1) fine-grained 
synthesis, (2) functional-based clustering, and (3) hierarchical set-covering partitioning. 
This method first synthesizes a design specification in ... 

Keywords: fine-grained synthesis, functional clustering, multi-way partitioning, multiple- 
FPGA synthesis 



5 The sun fireplane system interconnect
Alan Charlesworth 

November 2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(224.87 KB) Additional Information: full citation, abstract, references, citings, index terms

System interconnect is a key determiner of the cost, performance, and reliability of large 
cache-coherent, shared-memory multiprocessors. Interconnect implementations have to 
accommodate ever greater numbers of ever faster processors. This paper describes the 
Sun™ Fireplane two-level cache-coherency protocol, and its use in the medium and large- 
sized UltraSPARC-III-based Sun Fire™ servers. 

6 STiNG: a CC-NUMA computer system for the commercial marketplace
Tom Lovett, Russell Clapp 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2
Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index terms

"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) Multiprocessor 
designed and built by Sequent Computer Systems, Inc. It combines four processor 
Symmetric Multi-processor (SMP) nodes (called Quads), using a Scalable Coherent Interface 
(SCI) based coherent interconnect. The Quads are based on the Intel P6 processor and the 
external bus it defines. In addition to 4 P6 processors, each Quad may contain up to 4 
GBytes of system memory, 2 Peripheral Component Interface (PCI) busses for ...

7 Promises and reality: Server I/O networks past, present, and future
Renato John Recio 

August 2003 Proceedings of the ACM SIGCOMM workshop on Network-I/O
convergence: experience, lessons, implications
Full text available: pdf(225.62 KB) Additional Information: full citation, abstract, references, index terms

Enterprise and technical customers place a diverse set of requirements on server I/O 
networks. In the past, no single network type has been able to satisfy all of these 
requirements. As a result several fabric types evolved and several interconnects emerged to 
satisfy a subset of the requirements. Recently several technologies have emerged that 
enable a single interconnect to be used as more than one fabric type. This paper will 
describe the requirements customers place on server I/O networks; t ... 

Keywords: 10 GigE, Cluster, Cluster Networks, Gigabit Ethernet, I/O Expansion Network, 
IOEN, InfiniBand, LAN, PCI, PCI Express, RDMA, RNIC, SAN, Socket Extensions, TOE, 
iONIC, iSCSI, iSER 



8 ASIC microprocessors 
M. J. Flynn, R. I. Winner 

August 1989 ACM SIGMICRO Newsletter, Proceedings of the 22nd annual workshop on
Microprogramming and microarchitecture, volume 20 issue 3
Full text available: pdf(792.04 KB) Additional Information: full citation, abstract, references, citings, index terms

ASIC microprocessors are becoming an important technology for the control of complex 
("embedded") systems. The advantage of such microprocessors is that they can be tailored 
to the application. This tailoring is quite non-intuitive and optimization is a complex 
process. Tools such as the Architect's Workbench (AWB) have been developed to assist in 
this optimization. An example study shows a more than two to one advantage of such 
assisted analysis. 

9 The design of RPM: an FPGA-based multiprocessor emulator 

Koray Oner, Luiz A. Barroso, Sasan Iman, Jaeheon Jeong, Krishnan Ramamurthy, Michel 
Dubois 

February 1995 Proceedings of the 1995 ACM third international symposium on Field- 
programmable gate arrays 

Full text available: pdf(54.01 KB) Additional Information: full citation, abstract, references, citings, index terms

Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable 
interconnects have made it possible to build efficient hardware emulation engines. In 
addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, 
greatly simplify the design of large circuits. The RPM (Rapid Prototype Engine for 
Multiprocessors) Project leverages these two technological advances. Its goal is to develop 
a common hardware platform for th ... 

Keywords: field-programmable gate arrays, logic emulation, message-passing 
multicomputers, rapid prototyping, shared-memory multiprocessors 



10 An overview of the BlueGene/L Supercomputer 

NR Adiga, G Almasi, GS Almasi, Y Aridor, R Barik, D Beece, R Bellofatto, G Bhanot, R Bickford, 
M Blumrich, AA Bright, J Brunheroto, C Caşcaval, J Castanos, W Chan, L Ceze, P Coteus, S
Chatterjee, D Chen, G Chiu, TM Cipolla, P Crumley, KM Desai, A Deutsch, T Domany, MB 
Dombrowa, W Donath, M Eleftheriou, C Erway, J Esch, B Fitch, J Gagliano, A Gara, R Garg, R 
Germain, ME Giampapa, B Gopalsamy, J Gunnels, M Gupta, F Gustavson, S Hall, RA Haring, D 
Heidel, P Heidelberger, LM Herger, D Hoenicke, RD Jackson, T Jamal-Eddine, GV Kopcsay, E 
Krevat, MP Kurhekar, AP Lanzetta, D Lieber, LK Liu, M Lu, M Mendell, A Misra, Y Moatti, L Mok, 
JE Moreira, BJ Nathanson, M Newton, M Ohmacht, A Oliner, V Pandit, RB Pudota, R Rand, R 
Regan, B Rubin, A Ruehli, S Rus, RK Sahoo, A Sanomiya, E Schenfeld, M Sharma, E Shmueli, S 
Singh, P Song, V Srinivasan, BD Steinmacher-Burow, K Strauss, C Surovic, R Swetz, T Takken,
RB Tremaine, M Tsao, AR Umamaheshwaran, P Verma, P Vranas, TJC Ward, M Wazlowski, W
Barrett, C Engel, B Drehmel, B Hilgart, D Hill, F Kasemkhani, D Krolak, CT Li, T Liebsch, J 
Marcelia, A Muff, A Okomo, M Rouse, A Schram, M Tubbs, G Ulsh, C Wait, J Wittrup, M Bae, K 
Dockser, L Kissel, MK Seager, JS Vetter, K Yates 

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 

Full text available: pdf(357.61 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded 
research partnership between IBM and the Lawrence Livermore National Laboratory as part 
of the United States Department of Energy ASCI Advanced Architecture Research Program. 
Application performance and scaling studies have recently been initiated with partners at a 
number of academic and government institutions, including the San Diego Supercomputer 
Center and the California Institute of Technology. This mass ... 

11 Benchmarks for layout synthesis — evolution and current status 
Krzysztof Kozminski

June 1991 Proceedings of the 28th conference on ACM/IEEE design automation 

Full text available: pdf(679.23 KB) Additional Information: full citation, references, citings, index terms



12 Efficient management of memory hierarchies in embedded DRAM systems 
Ashley Saulsbury, Su-Jaen Huang, Fredrik Dahlgren 

May 1999 Proceedings of the 13th international conference on Supercomputing 

Full text available: pdf(1.57 MB) Additional Information: full citation, references, index terms



Keywords: COMA, DRAM, cache, latency, memory hierarchy, processor 



13 Performance analysis of embedded software using implicit path enumeration 
Yau-Tsun Steven Li, Sharad Malik 

November 1995 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1995
workshop on Languages, compilers, & tools for real-time systems,
Volume 30 Issue 11
Full text available: pdf(987.85 KB) Additional Information: full citation, abstract, references, citings, index terms

Embedded computer systems are characterized by the presence of a processor running 
application specific dedicated software. A large number of these systems must satisfy real- 
time constraints. This paper examines the problem of determining the extreme (best and 
worst) case bounds on the running time of a given program on a given processor. This has 
several applications in the design of embedded systems with real-time constraints. An 
important aspect of this problem is determining which paths in t ... 

14 Technology mapping and retargeting for field-programmable analog arrays 
Sree Ganesan, Ranga Vemuri 

January 2000 Proceedings of the conference on Design, automation and test in Europe 

Full text available: pdf(137.04 KB) Additional Information: full citation, references, citings, index terms

R. K. Gupta, C. N. Coelho, G. De Micheli

July 1992 Proceedings of the 29th ACM/IEEE conference on Design automation 

Full text available: pdf(789.92 KB) Additional Information: full citation, references, citings, index terms



16 Optimal clock period FPGA technology mapping for sequential circuits 
Peichen Pan, C. L. Liu 

July 1998 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 3 Issue 3
Full text available: pdf(252.82 KB) Additional Information: full citation, abstract, references, citings, index terms

We study the technology mapping problem for sequential circuits for look-up table (LUT) 
based field programmable gate arrays (FPGAs). Existing approaches to the problem simply 
remove the flip-flops (FFs), then map the remaining combinational logic, and finally put the 
FFs back. These approaches ignore the sequential nature of a circuit and assume the 
positions of the FFs are fixed. However, FFs in a sequential circuit can be repositioned by a
functionality-preserving transformation called ... 

Keywords: FPGAs, clock period, field-programmable gate arrays, logic replication, look-up 
tables, retiming, sequential synthesis, technology mapping 



17 Technology mapping of sequential circuits for LUT-based FPGAs for performance 
Peichen Pan, C. L. Liu 

February 1996 Proceedings of the 1996 ACM fourth international symposium on Field- 
programmable gate arrays 

Full text available: pdf(232.75 KB) Additional Information: full citation, references, citings, index terms



Keywords: FPGAs, clock period, logic replication, look-up table, retiming, sequential 
circuits, technology mapping 



18 High-confidence design for security: don't trust — verify 
Shiu-Kai Chin 

July 1999 Communications of the ACM, volume 42 issue 7 
Full text available: pdf(180.62 KB), html(27.73 KB) Additional Information: full citation, references, citings, index terms, review



19 S-connect: from networks of workstations to supercomputer performance 
Andreas G. Nowatzyk, Michael C. Browne, Edmund J. Kelly, Michael Parkin 
May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2
Full text available: pdf(1.38 MB) Additional Information: full citation, abstract, references, citings, index terms

S-Connect is a new high speed, scalable interconnect system that has been developed to 
support networks of workstations to efficiently share computing resources. It uses off-the- 
shelf CMOS technology to directly drive fiber-optic systems at speeds greater than 1 
Gbit/sec and can realize bisection bandwidths comparable to high-end MPP systems while 
being >10x more cost-effective. S-Connect systems do not rely on centralized switches,
but rather are composed of adaptive, topology independen ...

20 Design strategies for controlling standby leakage: An ASIC design methodology with
predictably low leakage, using leakage-immune standard cells
Nikhil Jayakumar, Sunil P. Khatri 

August 2003 Proceedings of the 2003 international symposium on Low power 
electronics and design 

Full text available: pdf(158.00 KB) Additional Information: full citation, abstract, references, index terms

In this paper we introduce a low-leakage standard cell based ASIC design methodology 
which is based on the use of modified standard cells. These cells are designed to consume 
extremely low and predictable leakage currents in standby mode. For each cell in a 
standard cell library, we design two low-leakage variants of the cell. If the inputs of a cell 
during the standby mode of operation are such that the output has a high value, we 
minimize the leakage in the pull-down network, and vice v ... 

Keywords: MTCMOS, leakage current, standard cells, standby current